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We propose one and a half criteria for determining how many measurements are needed to quantify 
entanglement reliably. We base these criteria on Bayesian analysis of measurement results, and apply 
our methods to four-qubit entanglement, but generalizations to more qubits are straightforward. 



I. INTRODUCTION 

The study of quantum entanglement never ceases to 
intrigue researchers [I] , and its verification has attracted 
just as much attention in the quantum information com- 
munity. Almost all entanglement verification methods 
[5] are designed for the situation where infinitely many 
data are (implicitly) assumed to exist. The finite-data 
regime has not been given much attention until recently 
[3]. In that paper the main question concerned the bi- 
nary decision about whether one's quantum systems are 
entangled or not. In the present paper we consider the 
task of quantifying entanglement with finite data. One 
of the questions we consider here is: how many mea- 
surements are needed to quantify entanglement reliably? 
Obviously, such a question cannot be answered in its full 
generality, as it will depend on what measurements are 
performed, on the number of qubits, and, possibly, on 
how accurate an estimate one wishes to have. Never- 
theless, we will develop general criteria for determining a 
"sufficient" number of measurements based on a Bayesian 
analysis of measurement data, which can be applied to 
any sorts of measurements and to any number of qubits. 
Our criteria do not actually need an accuracy to be spec- 
ified in advance. 

The other goal of this paper is to develop Bayesian esti- 
mation methods for entanglement in nontrivial cases. In 
particular, we choose to simulate experiments on (mixed, 
entangled) four-qubit states. Ref. [4] discusses the virtues 
of Bayesian methods for quantum state estimation, es- 
pecially as compared to maximum likelihood estimation 
(MLE), and here we consider that same comparison in 
the context of entanglement estimation. 

An advantage of Bayesian methods is that error bars 
on entanglement measures are generated automatically. 
MLE can generate error bars by using a bootstrap 
method, where /?mle is used to numerically generate 
more data, but this does not work when the number of 
data is small. In Sec. [TT] we compare these two meth- 
ods of generating entanglement estimates and their error 
bars. The Bayesian methods do require one to choose 
prior probability distributions over states. In Sec. |III A| 
we explicitly provide two inherently different standard 
prior distributions in our systems both of which are nu- 
merically feasible and both of which can be applied to 
any number of qubits. In Sec. |IIIB| a convenient entan- 



glement measure is introduced, that can be computed di- 
rectly from the multipartite density matrices, and which 
can, likewise, be generalized to any number of qubits. 
We also briefly discuss the disadvantages of this particu- 
lar measure (no known multi-partite entanglement mea- 
sure is without flaws, although a very recent preprint 
does improve upon the situation In Sec. IIIC we 



derive the relations needed for tomographic state recon- 
struction that are associated with a special kind of to- 



mographically complete measurements, and in Sec. |III D| 
we discuss our implementation of the Metropolis-Hasting 
algorithm, which allows one to sample from the posterior 
distribution efficiently. Finally, in Sec. |IV| we give our 
main results and attempt to answer the questions laid 
out in this Introduction. 



II. SOME COMPARISONS BETWEEN 
MAXIMUM LIKELIHOOD ESTIMATION AND 
BAYESIAN METHODS 

In entanglement verification experiments where to- 
mography is adopted, maximum likelihood estimation is 
widely accepted as the state estimation method of choice. 
Here the state that best fits the data, pmle, is accepted 
as the best estimate of the quantum state. While this 
may sound almost tautological, what MLE fails to give 
credit to is a large multitude of states that are almost as 
likely as Pmle (see Ref. [1]). Bayesian methods, on the 
other hand, take these states into account naturally. We 
will briefly compare MLE and Bayesian methods for en- 
tanglement estimation, and in later sections we will get 
into more details. 

Bootstrap methods combined with MLE can be used 
to generate a distribution of states (somewhat similar to 
the Bayesian posterior distribution of states). Here one 
assumes pmle as the real state, from which new sets (of 
the same size and type as the actual data set) of simu- 
lated measurement results are generated. Each such set 
yields a new MLE state; and thus a distribution of states 
is generated, which can be used to generate error bars. 
While this distribution does take into account to some 
extent the statistical fluctuations, the final estimation of 
entanglement might still be overly optimistic (see Ref. [J] 
for a clear account). 

The fundamental idea behind Bayesian inference fol- 
lows from Bayes' theorem. Assume H is a hypothesis 
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and D is the observation data. Bayes' rule tells us that 
the probability for hypothesis H to be true given obser- 
vation data D, also known as the posterior probability, 



is 



P(H\D) 



P{D\H) 
P{D) 



P{H). 



(1) 



P{H) is the prior probability, the probability of H prior 
to the observations of D. P(D\H) is the conditional prob- 
ability for D to be observed if H is true; it is also called 
the likelihood of the hypothesis given the data; and then 
it is denoted by C(D). P(D) is the marginal probability 
for data D, which is usually considered as a normalization 
factor, namely, as the sum of the conditional probabilities 
over all mutually exclusive hypotheses 



P(D) = Y d P{D\Hj)- 



(2) 



In our quantum context, the role of hypotheses is played 
by density matrices p. 

To be more specific now, we will look at a four-qubit 
system on which we will perform the simulations pre- 
sented in the paper (generalizations are straightforward). 
We assume some POVM is measured that can be writ- 
ten as a tensor product of local measurements, {IL.}, 
(because that tends to be the easiest type of measure- 
ment to perform in practice). The outcomes of the 
POVM measurement can then, likewise, be denoted by 
{Hj (g n fc (g II m (g II„}, and fjkmn is the frequency of 
getting the outcome Hj (g life <g H m <g II„. The likelihood 
functional for any state p is then by definition 

£(p) = II [Tr(pn,®n fc ®n m (8n n )] M/ ^ m " 

jkmn 

= n ( PJkmn ) Mf >—, (3) 

jkmn 

where M is the total number of measurements of the 
POVM, and 



Pjkmn = Tr (pllj (g life (g II m (g n„) 



IT 



(4) 

n„, 



is the probability of the outcome Hj (g life 
given the state p. 

The (physical) state that saturates the upper bound 
of C is called pmle- Since the estimation is a single 
state, which can be considered as a distribution with 
zero width, it is equivalent to taking the limit M — > oo. 
That is, MLE by reporting a single density matrix es- 
sentially assumes that the same data would repeat ad 
infinitum. Bayesian estimation methods yield the same 
answer as MLE in that limit, and the influence of the 
prior is eliminated. In the cases where /?mle saturates 
the upper bound of C in such a way that Pjkmn 
Tr (pmleIIj (g life ® II TO (g II„) [31], pmle tends to lie on 
the boundary of the set of physical states [4]. That is, 
the state is of non-maximal rank and some eigenvalues 
are zero. This usually happens when M is "small." 
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FIG. 1: (Color online) Left: Two different prior distributions 
of states, named "Z" and "GH" (for details, see main text, 
Section [ill A 1 1 in particular), and the induced prior distribu- 



tions of the 4-party negativity Ni (defined in Section III B I 



Right: GH prior distribution over the two negativities Ni and 
N2, showing the strong correlation between the two measures 
for randomly drawn states. The "white" noise is due to sta- 
tistical fluctuations due to the finite sample size. 



The second step of MLE+bootstrap is to simulate a 
new dataset by using Tr (puLE^-j (g life (g II m (g II„) as 
probabilities of measurement outcomes. Repeating this 
procedure many times (using the same pmle) will pro- 
duce a distribution of measurement outcomes and in- 
ferred quantities, and thus error bars on those quantities. 
As the new data are generated by pmle, entanglement 
can be easily overestimated if pmle lies on the bound- 
ary of the set of physical states, since generically rank- 
deficient states are more entangled than full-rank states. 
On the other hand, if pmle is away from the boundary, 
then the distribution produced this way is expected to re- 
semble the posterior distribution generated by Bayesian 
methods. 

One interesting question is how fast the gap closes 
up between the two estimates, MLE and Bayesian, as 
the number of measurements M increases. In fact, this 
comparison will serve as a (half) criterion for determin- 
ing how many measurements is "sufficient," provided we 
choose some "standard" prior distribution to be used in 
Bayesian entanglement estimation. 



III. PRELIMINARIES 

Before we can tackle the main questions of this paper, 
we need to make several choices, and we need several 
definitions. These arc all collected in this Section. 



A. PRIORS AND MEASURES 

Let S be the set of all physical states p and /1 be a 
measure on the space of S. Particularly in probability 
theory, J s dp = 1. If / is any real function of p, then the 
expectation value of / over the space of S is specified by 
the measure p,: 



</) = / f(p)dp. 
Js 



(5) 
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If p is parameterized by a set of real parameters x: 
p = p(x), then p becomes the Lebesgue measure over 
the space of x: dp — dx, where dx is the infinitesimal 
volume in the corresponding real parameter space. The 
choice of the parametrization of state p , which essentially 
implies the choice of measure in the space of all states S, 
induces a prior, P„(/j). Namely, a uniform random dis- 
tribution over the parameter space defines a particular 
prior distribution over the space of the physical states 
through the relation 



Pp.{p) d P = p(x)dx. 



(6) 



Thus we claim that we have, in this context, established 
the connection between the prior and the measure. In 
numerical implementations where one samples from the 
random distribution over x the integral is replaced by the 
sum: 



/ /(p(x))dx^Ax/((p(x)). 
Js 



(7) 



1. The GH and Z priors 

To study a system consisting of four qubits we choose 
two inherently different priors: Z and GH , which corre- 
spond to two distinct measures of the state space. The 
measures are chosen for their numerical convenience and 
for their extendability to arbitrary numbers of qubits. 
Moreover, they are both dense in the set of all states. 

To define the Z measure (or prior) we first write the 
density matrix for a four-qubit system as 



P : 



vevK 



(8) 



where E is a diagonal matrix that carries all the eigen- 
values and V is a unitary matrix. The measure of states 
can be chosen as a product of two particular independent 
measures introduced in [7] 



p(p) = p{E) x p(V). 



(9) 



p(E) constitutes a 15-dimensional simplex, which is a 
uniformly distributed manifold defined by a unit sum of 
16 nonnegative numbers, and p(V) is the Haar measure 
based on the direct products of four matrices, any sin- 
gle one of which is to be chosen from the set of three 
Pauli matrices and the identity. We name the prior cor- 
responding to this measure "Z prior" . 

Alternatively, we can parametrize a four-qubit state as 



p = HH ] /Tt(HH^) 



(10) 



where H is a random complex 16-by-16 matrix, with both 
the real and the imaginary part of each entry uniformly 
distributed on (—1, 1). This is closely related to Cholesky 
decomposition of the positive semidefinite matrices [? ], 
and similar to the parametrization used in Ref. [8] , except 
that in that paper the unit trace condition is imposed by 




FIG. 2: The posterior distributions resulting from a pure GH 
prior (w/o ID) and a mixed GH prior (w/ ID, i.e., with the 
identity mixed in, see text for details). From left to right, the 
curves describe single trials of just a 1000 measurements on 
the state p q (Eq.pS)) with q = 0.4, 0.6, 0.8 from left to right. 



Lagrangian multipliers while here the condition is satis- 
fied automatically. We name the prior correspond to this 
measure "GH prior" . 

Each prior over states induces a prior over any quan- 
tity that can be calculated as a function of the state. 
If we are interested in a quantity N(p), then we have a 
prior P(N)dN = (P(p)dN/dp)dp. In particular, in the 
next subsection we will define two measures of four-qubit 
entanglement, two "negativities", Ni and JVjj, that both 
can be calculated (easily) for given states. In FIG[l] we 
show the two induced prior distributions over N\ (left) 
and the GH prior distributions for Ni and N2 (right). 
From this point on we will stick to Z and GH priors for 
the demonstration of further results. Note that these pri- 
ors are not meant to represent anyone's subjective prior 
beliefs: rather they are two standard priors to be used 
for our specific purposes of quantifying entanglement and 
determining how many measurement are needed for that. 

It is worth mentioning some observations on simple 
variations of the above priors. In particular, both priors 
have the property that the weight of entangled states is 
larger than that of unentangled states (more precisely, 
we compare zero negativity states vs. nonzero negativ- 
ity states, using our definition of multi-partite negativity, 
see the following section for details). In order to achieve 
a prior distribution where the ratio of the weight of sep- 
arable states vs. entangled states is unity, we can mix 
in an appropriate amount of the identity matrix into the 
pure Z and pure GH measures. That is, after having 
picked a random state p' from either measure, we take 
p = Xp' + (1 — X)l/D, with t/D the maximally mixed 
state in the Hilbert space. A can be sampled from any dis- 
tribution that leads to a equal weight between entangled 
and non-entangled states. In our calculation we chose 
A = vP z > Glt , where u is uniformly (Lebesgue) random 
on [0, 1] and /3z,gh is an adjustable distortion parame- 
ter chosen to ensure a 50% probability of entangled or 
non-entangled states, Pz = 0.66 and (3gh — 0.50. 

How different are the pure and mixed priors as far as 
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quantifying entanglement is concerned? FIG (2] shows the 
posterior distributions after just a 1000 measurements 
for three different states, with pure and mixed GH prior 
respectively (the plots for the mixed and pure Z distri- 
butions are very similar). We find that in every case the 
"mixed" curve gives results very close to the correspond- 
ing "pure" curve, even when the measurements are still 
far from sufficient for reliable entanglement quantifica- 
tion (as we will see in Section IV). This indicates that 
the choice of "pure" Z or GH measures is at least some- 
what robust against certain simple modifications. 



B. MULTIPARTITE ENTANGLEMENT 
MEASURES 

As mentioned, the system we are particularly inter- 
ested includes four qubits, which is computationally af- 
fordable but sufficiently complicated as a step towards 
scalable multipartite systems. Despite the intensive stud- 
ies in the multipartite entanglement [9 20J over the years, 
almost all attempts at categorizing multipartite entan- 
gled states consider first pure states, and the entangle- 
ment measures for pure states can then be extended 
to mixed states through a convex roof extension, but 
this involves an arduous minimization over all possible 
decompositions of the mixed states. To illustrate our 
ideas without getting too involved in any numerical op- 
timizations, we choose to extend an easily calculable and 
thereby desirable measure, namely, negativity [21] , to the 
four qubit system. The negativity originated from the 
idea of the partial transpose [22] . As is well known by 
now, for 2 x 2 and 2x3 systems that negative partial 
transpose (NPT) is a necessary and sufficient condition 
for entanglement [23] . The negativity has been shown to 
be closely associated with the fidelity of quantum tele- 
portation [24] and its logarithm bounds the amount of 
entanglement that can be distilled [25]. The major ad- 
vantage of the negativity is that it is directly computable 
for both pure states and mixed states regardless of the 
size of the system, e.g., the number of qubits, as long as 
the density matrix is given. 

Suppose we have a quantum system consisting of mul- 
tiple subsystems. We can partition the subsystems into 
two groups, say X and Y. The negativity of a state p 
with respect to that partition X — Y , is defined as 

M X - Y (p) = \\p rY \\i-l, (11) 

where Ty stands for partial transpose with respect to 
subsystem Y and 1 1 • 1 1 1 for the trace norm of a matrix. 
For four-qubit systems there are two ways of partitioning 
into groups of certain sizes: "2 — 2" (partitioning the 
four qubits into two groups of two qubits) and "1 — 3" 
(partitioning them into one group of three and one single 
qubit). Correspondingly we define two negativities as the 



geometric means: 

A/" 2 _2 = {Nab-cdMac-bdNad-bc) 1,Z , (12) 

■A/1-3 = {Na-bcdNb-cdaNc-dabNd-abc) 1 ^ , 

(13) 

where 

Nab—cd — \\p^ CD \\i — 1) (14) 

and similar for all others. For simplicity we henceforth 
denote N2-2 by N\ and we use N 2 for A/"i_3. 

Despite the fact that these negativities can be com- 
puted regardless of the system, they do not necessarily 
make a distinction between all different types of four- 
party entanglement. For instance, both measures may 
be nonzero for states that are not genuinely four-party 
entangled (e.g., a state like [pab ® Pcd + Pa® Pbcd}/2, 
where pab, Pcd, and pbcd are entangled); and it may 
be zero for certain entangled states, namely those with 
the property that for at least one partition the entangle- 
ment is bound. 

On the other hand, both Ni and are entanglement 
monotones since each single Nab-cd or Na-bcd is an 
entanglement monotone, as shown in Ref. [21]. More- 
over, a vanishing N\ or N%, or equivalently a positive 
partial transpose (PPT) indicates nondistillability with 
respect to the corresponding partition |26j . 

Whereas for generic states Ni and N 2 are correlated to 
a high degree (FIG[T]), an illuminating counter-example 
(showing the independence of the two measures) is the 
Smolin state [27], given by 

P = J(|*+).4S(* + |® |* + )CD(* + | 
+ |$ + > AB ($+|® |$+> CD ($+| 

+\*~)ab{*-\®\*-)cd{*-\), (15) 

where 

\* ± ) = ^(\oi)±\m, 

|$ ± ) = i=(|00)±|ll)). (16) 

For the Smolin state, Ni = (it's separable along any 
2-2 cut) and N 2 = 0.5 (it's entangled along any 1-3 cut). 
More specifically, Nab-cd = Nac-bd = Nad-bc = 0, 
Na-bcd = Nb-cda — Nc-dab = Nd-abc — 0.5. 
The evaluations of the entanglement reflect perfectly 
what is shown in Ref. [27], that for the Smolin state, en- 
tanglement can be distilled between any one of the four 
qubits and part of the rest of the three qubits, while there 
is no entanglement between any two groups of two qubits. 

C. SIC-POVM AND THE INVERTED STATE 

For no particular reason we will assume we measure, on 
each single qubit, a class of tomographic POVMs that is 
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symmetric informationally-complete (the so-called SIC- 
POVMs), where any pair of two outcome vectors has ex- 
actly the same overlap. A single qubit SIC-POVM is 
formulated as [Ml 



1 



a)(a\, 



o 



1,2,3,4. 



(17) 



They are linearly independent, tomographically complete 
and satisfy the normalization condition 



Q = l 

and the symmetry condition 

Tr(n a n^) = 



(18) 




(19) 



The four-qubit POVM measurement we refer to is 
the tensor product of the SIC-POVM on individ- 
ual qubits so that only local measurements are per- 
formed. We label the nonorthogonal compound basis as 



M jkmn ,j,k,m,n = 1,2,3,4 and M jkmn = IT, ® n fe ® 

H m <S> II„ . The linear independence and the completeness 
of Mjkmn's can be inferred from the same properties of 
the n a 's for a single qubit system. This makes it possi- 
ble to expand arbitrary density matrices in terms of the 



M 



jkmn • 



jkmn 



n fc n ri 



n„ 



(20) 



Note that the coefficients qjkmn here can be negative 
without compromising the positivity of p. In fact, in 
order for p to be an entangled state, at least one of them 
must be negative (otherwise, Eq. (20) gives a separable 
form). With the help of Eq. Q we are able to tomo- 
graphically reconstruct the state by setting the proba- 
bilities equal to the measurement frequencies Pjkim and 
then expressing the coefficients qjkmn 's in terms of the 
probabilities Pjkmn s: 



Qjkmn 



6 Pjkmn 6 I ^ ^ Pctkmn ~\~ ^ ^ Pjpmn ~t~ ^ Pjk^n ~\~ ^ ^ 



PjkmS 



+6 2 I ^Pa/3mn +^Pak 7 n + ^ PakmS + Pj^n + ^^PjfimS + ^ Pj 



k-yS 



a/3 



01 



PS 



+ l. 

\ /?7<5 cyyS a/35 ap^y J 



(21) 



In an actual experiment where the readout frequencies 
fjkmn are considered as Pjkmn, the state reconstructed 
by Eq.(j2l| with p jkmn = fjkmn is called ptomo, which is 
equal to pmle if and only if ptomo is physical (see [3]). 
For the case where it is not physical, pmle can be ap- 
proximated by setting the negative eigenvalues of ptomo 
equal to zero, followed by a renormalization of the density 
matrix. 



D. METROPOLIS-HASTINGS 

The Metropolis-Hastings algorithm (or MH), among 
many Monte Carlo methods, is applied to generate a 
Markov chain of states to obtain directly the posterior 
distribution over states (hence a "walk"). The MH walk 
is known for its fast convergence even when the sampling 
space is too large for direct random sampling to be effi- 
cient. Since the state space of mixed four-qubit states is 
already 255-dimensional, it makes sense for us to use this 



method. 

The algorithm starts at any random (physical) state 
and decides to take (or not take) the following random 
step each time towards a new state depending on the 
relative likelihood of the new and the old state. The 
process lasts until a converging distribution is reached 
from the steps taken. The overall outcome is a path in 
the state space towards the region with the most likely 
states and wandering about that region. One then counts 
how often a certain state occurs; that is its weight in the 
posterior distribution. More precisely, the probability of 
taking a step is determined by the ratio of the likelihood 
of the next and the current state. For example, if the 
likelihood of the next state is 0.7 times the likelihood 
of the current state, then there is a chance of 70% the 
next state is accepted. On the other hand, if the next 
state more likely than the current state, i.e. the ratio 
of the likelihood is larger than 1, then the acceptance 
is definite. Since the MH walk spends most of its time 
on the most likely states, it manages to outperform pure 
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FIG. 3: (Color online) A Z-prior based Metropolis-Hastings 
walk towards p(q = 0.8) (Eq.{23|), for which JVi = 0.3875, 
N2 — 0.3339. The number of measurements is 10 4 . 

random sampling substantially. 

One of the concerns in MH walk is setting the appro- 
priate step size, from one state to the next. It can be 
defined in a certain chosen measure as 

^step — ||Pnext /^current 1 1 • (^^) 

A small step size may costs a long time for the algorithm 
to converge, although still faster than random sampling, 
while a large step size tends to identify less likely states 
by getting stuck in a low likelihood region, which then 
produces a less accurate distribution. 

In standard practice the acceptance rate, which is de- 
fined as the overall probability of accepting a step, is 
used as a quantitative reflection of a step size. There 
is no rigid proof of what an optimal acceptance rate is, 
as the final distribution converges to a smooth one. In 
our work we tested a wide range of possible step sizes, 
balancing the stability and the efficiency of the program, 
and managed to keep it between 35% and 40%, close to 
the ideal acceptance rate for Gaussian target distribution 
[29] . As shown in FIG{3j the algorithm quickly navigates 
to the desired area after about 1,000 steps and stays there 
"indefinitely" until we terminate the procedure after 10 5 
steps. 

IV. HOW MANY MEASUREMENTS? 

In order to examine how many measurements suffice 
for a reliable report of the amount of entanglement in 
terms of the negativities, it is enlightening to study states 
that are unlikely to be mistaken as separable states. We 
choose a particular class of four-qubit states, namely W 
states with white noise mixed in, which can be charac- 
terized by 

p(q) = q\W)(W\ + (l-q)l/16, (23) 

where \W) = \ (|0001) + |0010) + |0100) + |1000>) and 1 
is the 16-by-16 identity matrix. \W) is known to pos- 
sess genuine multipartite entanglement [12 , and such 
genuine entanglement can be detected and distinguished 



from 3-party and 2-party entanglement, as demonstrated 
recently in an actual experiment [30] . According to the 
entanglement monotones given earlier in the paper, p be- 
comes 2-2 separable (i.e., N\ — 0) when q < 0.1112 and 
1-3-separable (i.e., N 2 = 0) when q < 0.1262. When 
a sufficiently large q value is chosen, the state is less 
likely to be confused as a separable one. Indeed, the 
similarities shared between our results for the states 
p(q = 0.8,0.6,0.4) suggests that the conclusions from 
these three test states can be validly applied to the class 
of states with a wide range of q values as long as the state 
is safely entangled. 

In the spirit of Bayesian estimation, the posterior dis- 
tributions are determined by both the observation data 
and the prior, with the former becoming more and more 
important as data accumulates. When the posterior dis- 
tributions resulting from the two inherently distinct GH 
and Z priors are laid together, we expect that they will 
overlap more and more as a function of the number of 
measurements. Indeed such behavior is demonstrated in 
FIG. [4] and FIG. [5] and this behavior forms the basis of 
our Criterion 1. In particular, FIG. [4] shows the evolu- 
tion of the Bayesian posterior distributions as the num- 
ber of measurements M increases along 10 4 , 10 5 to 10 6 . 
The expectation values (Ni^) and the standard errors 
{25Ni^) are computed and shown in FIG[5] on a loga- 
rithmic scale for the two priors and different numbers of 
measurements. Both (25Ni : 2)z and (8Ni^)gh are fitted 
with 1/M - 5 (see Appendix). On the other hand, in the 
Figure, \{Ni <2 )z ~ (Ni i2 )gh\ is fitted with M~ a , where 
a is approximately 0.81 for Ni and approximately 0.66 
for N 2 . The behavior of \(Ni^)z ~ (Ni. 2 )gh\ is ana- 
lyzed analytically in the M — > 00 limit in the Appendix, 
with several important simplifying assumptions made. It 
shows that for any not-too-pathological prior, the aver- 
age posterior value of a physical quantity N approaches 
the true value N T as 

\(N) - N t \ ~ 1/t/M, (24) 

when M is large. When any two priors are considered 
with the same observation data, the difference between 
the average posterior values of N behaves like 

\(N) z -(N) GH \~l/M, (25) 

which converges faster by a factor of order VM. This is 
because the uncertainty in the data affects each value of 
(N) for each prior in the same linear fashion, and hence 
this uncertainty is canceled out when the difference is 
taken. This observation leads directly to our first Crite- 
rion. 

For entanglement quantification to be reliable we re- 
quire a number of measurements M such that for M and 
larger number of measurements, we have 

Criterion 1 : 

\(N) Z -(N) GH \ < (SN) Z + (6N) GH . (26) 
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FIG. 4: (Color online) Estimated probability distribution 
P(iVi) of MLE & bootstrap method (dot-dashed red) against 
the posterior distributions with Z and GH priors (solid blue 
and solid green respectively), after the same series of mea- 
surements. The broadest, the medium and the sharpest dis- 
tribution for each color correspond to M = 10 4 , 10 s , 10 6 . The 
state being considered is p(q — 0.6) (Eq.( |23[ )). The red curves 
are obtained by assuming pmle is the real state, from which 
the corresponding measurements are simulated and a Pmle is 
found for each set of measurements, thus not requiring any 
prior. Around M ~ 10 5 measurements all three methods 
more or less agree with each other. 



(Obviously, one can always substitute's one favorite mea- 
sure of entanglement instead of N to create a new crite- 
rion. To repeat, our choice of the negativity is for numer- 
ical convenience, as well as the fact our measure can be 
easily generalized to any number of qubits.) This means 
the peaks of the two distributions are closer to each other 
than their mean standard error. When the two priors 
are well chosen to be sufficiently distinct, the difference 
likewise is, presumably, sufficiently large to be spotted. 
As the measurements accumulate, the distribution will 
converge towards the true value. And when Eq.(26l is 



satisfied, we claim that the measurements suffice to be 
trusted and the posterior distribution from either of the 
priors qualifies as the final result. 

According to the Appendix we can write \(N)z — 
(N)ch\ = A/M in the large M limit, where A is a con- 
stant. We also write (SN)z,gh — Bz,gh/VM, where 
Bz,gh are constants, as indicated by the fitting. Then 



Eq.(|26) is satisfied for VM > A 2 /{B Z + B GH ) 2 . This 



is observed when the number of measurements is larger 
than about 10 5 (FIG. p]). Therefore 10 5 is the number 
of the SIC-POVM measurements necessary, according to 
Criterion 1, for an honest assessment of the amount of 
entanglement in terms of negativities in a four-qubit sys- 
tem. 

Note that both of the two priors used in this paper 
are easily generalized to larger number of qubits or other 
higher dimensional systems. The inherent difference be- 
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FIG. 5: (Color online) The difference between the estimations 
of {Nip) using the Z and GH priors, compared with the stan- 
dard error (5N 1>2 ) for p(q = 0.6) (Eq.(23|). The dash-dotted 
lines [connecting the sizes of the error bars] are fitted with 
c/M ' 5 , where c is a number different for the left (Ni) and 
right (N2) figures. The dashed lines [connecting the differ- 
ences between the two estimates] are fitted to guide the eye 
with c/M 81 (left) and c/M om (right) respectively. (Figures 
for p(q = 0.8) and p(q = 0.4) look very similar, except for the 
differences in the slopes of the fitting (dashed) lines.) Our 
Criterion 1 is formulated in terms of the average of (25Ni) 
and {2SN2), such that the location where the dashed and the 
dash-dotted lines cross indicates the number of measurements 
needed for reliable entanglement estimation. 



tween the two, which is observed in terms of the negativ- 
ities for two qubits and four qubits (FIG[T]), is expected 
to persist in similar quantities for larger systems. As a 
result, the proposed criterion can be extended to multi- 
qubit systems straightforwardly. 

Our next criterion compares estimates of entanglement 
based on MLE with a Bayesian estimate, using a prior 
P (either GH or Z). For entanglement quantification to 
be reliable we require a number of measurements M such 
that for M and larger number of measurements, we have 



Criterion 1.5 : 

KAOp-AWI < (SN) P 



6N1 



MLE 7 



(27) 



where P stands for either Z or GH. It is, of course, 
safest (i.e., most conservative) to employ both priors, and 
pick the larger value of M as sufficient. According to the 
Appendix, (N) p and A/mle approach each other at the 
rate of l/M. The argument that |N M le - N T \ - l/\/M 
can be used to imply that SNmle ~ l/VS, since what 
p r is to pmle is exactly what pmle is to all p's that 
constitute the bootstrap distribution. Therefore, similar 
to Criterion 1, a number of measurements M can always 
be found for Criterion 1.5 to be satisfied. 

In words, the criterion accepts an estimate of entan- 
glement as reliable if the Bayesian estimate, based on 
some prior P, and the MLE estimate (using the boot- 
strap method) agree with each other. It's only half a 
criterion, as a Bayesian should see no reason to accept 
the MLE estimate as judge for his estimate; nor should a 
frcquentist accept the Bayesian estimate with some ran- 
domly picked prior for that purpose! It is presumably 



a good criterion for agnostics (and in that case, not in- 
dependent of the first Criterion, as MLE will agree with 
both Bayesian estimates only if the latter agree with each 
other). 

As Fig.Q shows, the bootstrap results [for our par- 
ticular state tested] bear a greater deal of similarities 
with the Gii-based posterior distribution than with the 
Z-based posterior. Thus the latter determines the criti- 
cal value of M. For this particular case, one finds once 
again that M m 10 5 is necessary for reliable entangle- 
ment quantification. Thus, here both Criteria agree with 
each other. 



V. CONCLUSIONS 

We formulated criteria to determine a sufficient num- 
ber of measurements for reliable entanglement quantifi- 
cation. The main criterion uses two different "standard" 
prior distributions over states, used in a Bayesian analy- 
sis of the measurement data. Namely, if the two posterior 
distributions resulting from two different priors agree on 
the amount of entanglement (within error bars) then we 
can declare that our results have converged and, there- 
fore, that they are reliable. A second criterion, not quite 
independent of the first, compares the results from max- 
imum likelihood estimation (MLE), without using any 
prior, to the two Bayesian results. If MLE agrees with 
the two Bayesian estimates, then, again, we can declare 
the results sufficiently reliable. Obviously, in this case the 
two Bayesian estimates must also agree with each other, 
and that is why the second criterion is not independent 
of the first. 

We illustrated these criteria by applying them to a 
particular set of measurements on four qubits [and then 
both criteria agreed with each other on what constitutes 
a sufficient number of measurements], but all our results, 
including the prior distributions, and the measurements 
considered, and the criteria themselves easily generalize 
to more (or fewer) qubits. 

In order to perform these calculations, we also pro- 
posed four-qubit entanglement monotones (based on the 
negativity) that can be calculated for arbitrary mixed 
states. Those monotones, too, generalize easily to differ- 
ent number of qubits. 

In fact, the extcndability of both entanglement mea- 
sures and priors to arbitrary numbers of qubits is the 
principal reason to choose these particular criteria (given 
these ingredients, the criteria then take a standard 
form for distinguishing two (peaked) probability distri- 
butions). 

The next question to be answered is how the suffi- 
cient number of measurements scales with the number 
of qubits. How one can analyze this question when 
the Hilbert space is so large that even the Metropolis- 
Hastings algorithm fails to work relaibly, is the subject 
of a follow-up paper. 



Appendix A: Asymptotic behavior of the 
expectation value of the posterior distribution 

Suppose we are interested in a particular quantity N(p), 
where p is a physical state. Suppose the number of mea- 
surements M is large and the posterior for N, P(N), can be 
approximated by a normal distribution. Then the estimated 
value of N is where the maximum of P(N) is. We have 



dP(N) 



dN 



dlogP(p) 



f^ dPi Y t/ ,i) 



dp 



dN(p) 
dp 



dp 



0, 



provided that dN/dp is analytical in the range of p. 
Recall that 



P(p) = 



Pq(p)C(p) 
Jdp'P (p')£(p') 



where 



C(p) = H Pj (p) F * : 







(Af) 



(A2) 



(A3) 



Pj{p) is the probability of the j'th result to be observed if the 
tested state is p and Fj is the number of times the j'th result 
is actually observed. We can approximate Fj's in terms of 



(A4) 



where p r is the real state and Xj is a normally distributed 
variable with variance f . It means 



Xi = 0, 



I, for every j. 



(A5) 



The bar average Xj, instead of the bracket average as in (N), 
indicates that the average is not taken over an ensemble of 
possible states. Instead, a random Xj value is generated each 
time a measurement record is collected, as M varies. Whether 
or not this average is to be taken depends on the specific 
questions and may be cleared up later. Moreover, Xj and Xk 
are independent of each other except for one constraint: 



We define 



Q(p) = i[Pi(p) PjM 



and 



C M (p) = l[pj(p) 



\/PjiPr)l~l--Pj(Pr)]Xj 



Then the likelihood function becomes 

£(p) = Q M (p)C A f 7 ( P ). 



(A6) 



(A7) 



(A8) 



(A9) 



Note that the subscript in Cm suggests the subtle dependence 
on M through variables Xj's. Hence Cm{p) indeed corre- 
sponds to a single trial correction. However, note that the be- 
havior we study are not limited to a single trial. In fact, as in 
FIGjEJ the red squares that correspond to | {Ni z 2)z — {Ni,2)gh\ 
really come from multiple trials that are affected by different 
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noise profiles Xj's. Eventually the average over multiple trials 
is to be taken and the statistics of Xj's will be applied. 

Since the integral in the denominator is just a constant, the 
zero derivative condition Eq.(All gives 



M dlogQ( P ) + ^ dlogCW(g) + dlogPo(p) 



dp 



dp 



dp 



= 0. 
(A10) 



Not e tha t at large M, whichever p that satisfies Eq,( |Al[ ) or 
Eq.(|AT0| is going to be very close to the actual state p r , which 
is also the maximum of logQ(p), i.e. logQ(p r ) = logQ max . 
We expand logQ(p) around p r up to O ((p — p r ) 2 ): 

log Q(p) ~ log Q{p t ) - i (p - p r ) T ■ d2l ° d g ? (p) • (P - Pr) 

1 T~ 

= L Q -~{p-p x ) a(p-pr), (All) 



where Lq — logQ(p r ) and a = — d logQ(p)/dp | . The 
first derivative term is absent since p r is the local maximum 
and therefore a > 0. This implies 



d log Q(p) T 
dp ="("-^) 



(A12) 



Similar expansion is applied for logPo(p) and log Cm (p) so 
that 



d ^^=/3 T + (p-prf7, 



(A13) 



where /3 T — d log Po(p)/dp| p and 7 = d 2 log Po (p)/dp 2 1 . 

(A14) 



dl ° gCM(p) =C T + (p-p r ) T 57, 



dp 

where C T = dlogC Af (p)/dp| pi and rj = d 2 log C M (p) /dp 2 1 . 
Therefore the zero derivative condition becomes 

- M(p - p r ) T 5 + (C T + (p - p r ) T 5?) 

+/3 T + (p - p r ) T 7 = 0.(A15) 

Solving it for the maximum state: 

Pmax = Pr + Sp, 

where 



5p- 



! 1 =5- 1 C+45~ 1 (/3 + ^5- 1 C) + 



M 



1 



.(A16) 



Now we suppose that iV(p max ) is also where the largest 
probability of N(p) is, which is again assumed to be the ex- 
pectation value, (N). Then 



A'(Pmax) ^ N(p t ) + 



dN 
dp 



■ dp 



where A^ r 



(A17) 



A^(p r ) and A = dN(p)/dp\ . Since a, /3, (, rj 



and A are all fixed and presumably nonzero, the behavior of 
(N) as it approaches its true value N r goes 



KAO-Arl-l/VM, 



(A18) 



in large M limit. 



Note that the first correction in Eq.(A16l, a 1 ^/V~M is 



merely influenced by the fluctuation of the data through £ 
and the shape of the likelihood function through 5, which is 
determined by the measurement setup. It implies a consistent 
behavior with no regard of the choice of the prior distribution. 
Yet the second correction does. We label p max and /3 with 
subscript Z or GH to differentiate the priors. We have 



PZmax — PGflmax = (Pz 



Pgh) 



(A19) 



From the previous analysis we realise that the M-dependence 
in the difference in the state p will carry on to the difference 
in the negativity N. Eq. (A19l indicates 



\{N)z-(N) G h\~1/M. 



(A20) 



Appendix B: Asymptotic behavior of the maximum 
likelihood estimation 



Since we are concerned with the large M region, we assume 
that Pmle is not on the boundary, so that it satisfies 



dlog£(p) 
dp 



= 0. 



(Bl) 



Using the same expansion in the vicinity of the real state p r 
as in the last section, we obtain 



Pmle = Pr 



1 1 v / 1 



.(B2) 



Compared to Eq.( A16 1, the only difference in the higher-order 



terms is that the term containing j3 is missing. When the neg- 
ativity N(p) is considered, similar conclusions can be reached: 



(B3) 



and 



|JV(pmle) - Nr\ ~ 1/VM 



|AT(pmle) - {N) z ,gh\ ~ 1/M. 



(B4) 
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